Syntia: Synthesizing the Semantics of Obfuscated Code
Abstract
Current state-of-the-art deobfuscation approaches operate on instruction traces and combine symbolic execution with taint analysis, two techniques that require precise analysis of the underlying code. However, recent research has shown that both techniques can easily be thwarted by specific transformations. Because program synthesis can synthesize code of arbitrary complexity, it is limited only by the complexity of the underlying code's semantics. In our work, we propose a generic approach for automated code deobfuscation using program synthesis guided by Monte Carlo Tree Search (MCTS). Specifically, our prototype implementation, Syntia, simplifies execution traces by dividing them into distinct trace windows whose semantics are then "learned" by the synthesis. To demonstrate the practical feasibility of our approach, we automatically learn the semantics of 489 out of 500 random expressions obfuscated via Mixed Boolean-Arithmetic. Furthermore, we synthesize the semantics of arithmetic instruction handlers in two state-of-the-art commercial virtualization-based obfuscators (VMProtect and Themida) with a success rate of more than 94%. Finally, to substantiate our claim that the approach is generic and applicable to different use cases, we show that Syntia can also automatically learn the semantics of ROP gadgets.
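The idea of learning semantics from observed input/output behavior rather than from the obfuscated syntax can be illustrated with a small sketch. This is not Syntia's implementation: it replaces the MCTS-guided search with a brute-force scan over a tiny, hand-picked candidate set, and the names `obfuscated`, `CANDIDATES`, `sample_io`, and `synthesize` are hypothetical. The obfuscated function uses the well-known Mixed Boolean-Arithmetic identity x + y = (x ^ y) + 2 * (x & y).

```python
import random

MASK = 0xFFFFFFFF  # model 32-bit machine arithmetic (an assumption for this sketch)


def obfuscated(x, y):
    """Black-box oracle: an MBA-obfuscated form of x + y."""
    return ((x ^ y) + 2 * (x & y)) & MASK


# Hypothetical candidate set; a real synthesizer would search a grammar instead.
CANDIDATES = {
    "x + y": lambda x, y: (x + y) & MASK,
    "x - y": lambda x, y: (x - y) & MASK,
    "x ^ y": lambda x, y: x ^ y,
    "x & y": lambda x, y: x & y,
    "x | y": lambda x, y: x | y,
    "x * y": lambda x, y: (x * y) & MASK,
}


def sample_io(oracle, n=50, seed=0):
    """Query the oracle on random 32-bit inputs and record (inputs, output) pairs."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x, y = rng.getrandbits(32), rng.getrandbits(32)
        samples.append(((x, y), oracle(x, y)))
    return samples


def synthesize(samples):
    """Return the first candidate that agrees with every observed sample, or None."""
    for name, func in CANDIDATES.items():
        if all(func(*inputs) == output for inputs, output in samples):
            return name
    return None


if __name__ == "__main__":
    print(synthesize(sample_io(obfuscated)))
```

Agreement on random samples is evidence of equivalence, not a proof. The sketch only conveys why the syntactic complexity of the obfuscated expression does not matter to an approach that observes behavior alone.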
Similar resources
Behavioral Analysis of Obfuscated Code
Classically, the procedure for reverse engineering binary code is to use a disassembler and to manually reconstruct the logic of the original program. Unfortunately, this is not always practical as obfuscation can make the binary extremely large by overcomplicating the program logic or adding bogus code. We present a novel approach, based on extracting semantic information by analyzing the beha...
Guard Reasoning for CHR Optimization
Constraint Handling Rules (CHR) is a high-level language commonly used to write constraint solvers. Most CHR programs depend on the refined operational semantics, resulting in an obfuscated logical reading and potential misbehavior under the theoretical operational semantics. We introduce two source to source transformations: guard simplification and occurrence subsumption. By removing redundan...
Towards Efficient Code Synthesis from Statecharts
This paper describes a strategy for synthesizing efficient code from UML statecharts based on SMDL, an intermediate language with formal operational semantics. We use an intermediate language to support semantic variations in UML models and different target programming languages. SMDL models are implemented using Software Graphs that can be reduced to generate optimized code.
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
We identify obfuscated gradients as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat optimization-based attacks, we find defenses relying on this effect can be circumvented. For each of the three types of obfuscated gradients we discover, we describe indicators of defenses exhibiting th...
Context-Sensitive Analysis of x86 Obfuscated Executables (doctoral thesis, Graduate Program in Electrical Engineering)
Code obfuscation aims to make a program more difficult to understand while preserving its functionality. Programs may be obfuscated to protect intellectual property and to increase the security of code. Programs may also be obfuscated to hide malicious behavior and to evade detection by anti-virus scanners. We introduce a method for context-sensitive analysis of binaries...